Abstract

The assignment problem takes as input two finite point sets S and T
and establishes a correspondence between points in S and points in T,
such that each point in S maps to exactly one point in T, and each point
in T maps to at least one point in S. In this paper we show that this
problem has an $O(n \log n)$-time solution, provided that the points in S
and T are restricted to lie on a line (linear time, if S and T are presorted).

1 Introduction 

Consider two finite sets of points S and T with total cardinality n. The
objective of the assignment problem is to establish a correspondence between the
points in S and the points in T, such that each point in S corresponds to
exactly one point in T, and each point in T corresponds to at least one point in
S. This correspondence is measured by a cost function $\delta$ that assigns a cost
$\delta(s,t)$ to each assigned pair $(s,t)$. The cost of an assignment is the sum of the
costs of all assigned pairs. The goal of the assignment problem is to find an
assignment of minimum cost.

The general assignment problem is also known as the many-to-one assignment
problem. The one-to-one version of the assignment problem requires
that each point in S maps to exactly one point in T and each point in T is
mapped to by exactly one point in S. Throughout the paper, whenever we talk about
the assignment problem, we refer to the many-to-one version of the problem.

The simplest version of the assignment problem assumes that the points in
S and T lie on a line and the cost function is the $L_1$ metric. In this setting,
the one-to-one assignment problem has a simple $O(n \log n)$ time solution when
$|S| = |T|$: first sort the points in $O(n \log n)$ time, then map the k-th point in S
to the k-th point in T in $O(n)$ time [?]. However, the situation $|S| < |T|$ arises
in many practical applications. This situation was first addressed by Karp and
Li [?], who provided an $O(n \log n)$ time algorithm for the one-to-one assignment
problem ($O(n)$ time, if S and T are given in sorted order). Simpler and equally
efficient solutions have later been provided in [?, ?].
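The sort-and-match rule for the case $|S| = |T|$ can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `one_to_one_assignment` is an assumption made for illustration, and the input is the pair of point sets used later in Figure 3.

```python
def one_to_one_assignment(S, T):
    """Pair the k-th smallest point of S with the k-th smallest point of T."""
    if len(S) != len(T):
        raise ValueError("this sketch only covers the case |S| = |T|")
    pairs = list(zip(sorted(S), sorted(T)))      # O(n log n) for the sorts
    cost = sum(abs(s - t) for s, t in pairs)     # L1 cost on the line
    return pairs, cost


pairs, cost = one_to_one_assignment([6, 0, 13, 4, 16, 14], [2, 1, 12, 8, 11, 10])
print(pairs)
print(cost)  # 15
```

If the inputs are already sorted, the two `sorted` calls can be dropped and the pairing is a single linear pass.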

Eiter and Mannila [5] studied the assignment problem in the context of
measuring the distance between two theories expressed in a logical language. They
showed that for points in arbitrary dimensions, this problem has a polynomial
time solution. When restricted to points on a line, a minimum cost assignment
can be used in measuring the similarity between musical rhythms. In this
context, Toussaint [8] proposed the use of the directed swap distance as a similarity
measure. If the onsets of a rhythm are represented as points on a line separated
by "silence" intervals, the directed swap distance between two rhythms with
onset sets S and T is precisely the cost of an optimal assignment between S
and T, with underlying cost function $L_1$.

The assignment problem also appears in the shape of the restriction scaffold
assignment problem in computational biology [2]. The goal here is to establish
a correspondence between sparse experimental data and a restricted set of
known structural building blocks. Ben-Dor et al. [2] model the restriction
scaffold assignment as an assignment problem for points on a line, and provide
an $O(n \log n)$ time algorithm to solve this problem. However, as later shown by
Colannino and Toussaint [4], this algorithm does not always produce a minimum
cost assignment. Thus, the best existing solution to the assignment problem in
one dimension is the $O(n^2)$ algorithm presented in [?].

In this paper, we show that the assignment problem with underlying cost
function $L_1$ in one dimension can be solved in $O(n \log n)$ time ($O(n)$ if the points
in S and T are given in sorted order). Our algorithm is a simple extension of
the $O(n \log n)$ time algorithm of Karp and Li [?] for finding the minimum cost
one-to-one assignment over T and all subsets $S' \subseteq S$ of size $|T|$, assuming
$|S| \ge |T|$. We present our algorithm in Section 4 after a few preliminary
results (Section 2.1) and a close look at some properties of an optimal solution
(Section 3).

2 Background 

Let $S = \{s_0, s_1, s_2, \ldots\}$ and $T = \{t_0, t_1, t_2, \ldots\}$ be two finite sets of points that
lie on a horizontal line, with $|S| + |T| = n$ and $|S| \ge |T|$. For any $s \in S$ and
$t \in T$, the cost $\delta(s,t)$ of an assigned pair $(s,t)$ is the absolute value of the difference
between the x-coordinates of s and t. To avoid overloading the notation, we
use the same symbol for a point and its x-coordinate. Thus, $\delta(s,t) = |s - t|$.
We assume that $s_i < s_{i+1}$ for $0 \le i < |S| - 1$ and $t_j < t_{j+1}$ for $0 \le j < |T| - 1$.

An assignment A between S and T consists of pairs of points $(s,t)$ (henceforth
edges), with $s \in S$ and $t \in T$, such that each point in S belongs to exactly
one edge in A, and each point in T belongs to at least one edge in A. The cost
of A is

$$\mathrm{cost}(A) = \sum_{(s,t) \in A} \delta(s,t)$$
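The definition of an assignment and of its cost translates directly into code. The sketch below assumes that an assignment is stored as a list of `(s, t)` pairs and that all points in S are distinct; the names `is_assignment` and `cost` are illustrative only.

```python
def is_assignment(edges, S, T):
    """Check the many-to-one conditions: every point of S lies on exactly
    one edge, and every point of T lies on at least one edge."""
    sources = sorted(s for s, _ in edges)
    targets = {t for _, t in edges}
    return sources == sorted(S) and targets == set(T)


def cost(edges):
    """Sum of |s - t| over all edges (s, t)."""
    return sum(abs(s - t) for s, t in edges)


edges = [(0, 1), (3, 1), (4, 1)]
print(is_assignment(edges, S=[0, 3, 4], T=[1]))  # True
print(cost(edges))                               # 6
```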

Our goal is to find an assignment A of minimum cost. If two points in $S \cup T$
have the same x-coordinate, we can slightly shift one of them to the left or
right. If the minimum cost assignment is unique and the change is sufficiently
small, this change will not affect the optimal assignment. If there are several
assignments with the same optimal cost, at least one of them will be the optimal
solution of the new point set. So we may assume without loss of generality that
all points in $S \cup T$ are distinct.
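For very small inputs, the minimum cost can be found by exhaustive search, which is handy as a reference when experimenting with faster methods. The following sketch is not part of the method developed in this paper: it simply enumerates every map from S to T, keeps the maps that cover all of T, and takes exponential time. The name `brute_force_min_cost` is an assumption.

```python
from itertools import product


def brute_force_min_cost(S, T):
    """Try every function S -> T that covers all of T; return (cost, edges)."""
    best = None
    for images in product(T, repeat=len(S)):
        if set(images) != set(T):        # some point of T would be left unassigned
            continue
        c = sum(abs(s - t) for s, t in zip(S, images))
        if best is None or c < best[0]:
            best = (c, list(zip(S, images)))
    return best


print(brute_force_min_cost([0, 3, 4, 6], [1, 8]))
# (8, [(0, 1), (3, 1), (4, 1), (6, 8)])
```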

2.1 Preliminaries 

For any $s \in S$ and $t \in T$, the value $|s - t|$ can be expressed in a different way
as follows. Define a function $f_{s,t}$ to be 1 in the interval between s and t and
0 at any other point (see Figure 1). Then $|s - t| = \int_{-\infty}^{+\infty} f_{s,t}(x)\,dx$.

Figure 1: Function $f_{s,t}$. The shaded area represents the cost $|s - t|$.

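The cost of an assignment A is therefore

$$\mathrm{cost}(A) = \sum_{(s,t) \in A} \int_{-\infty}^{+\infty} f_{s,t}(x)\,dx = \int_{-\infty}^{+\infty} \sum_{(s,t) \in A} f_{s,t}(x)\,dx$$

If we define

$$f_A(x) = \sum_{(s,t) \in A} f_{s,t}(x)$$

then the value $f_A(a)$ is simply the number of edges in A pierced by the vertical
line $x = a$, and the cost of A is

$$\int_{-\infty}^{+\infty} f_A(x)\,dx \tag{1}$$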
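The identity in (1) can be checked numerically. Since $f_A$ is constant between consecutive edge endpoints, the integral reduces to a finite sum. In this sketch, which uses the hypothetical helper names `pierced` and `cost_by_integration`, $f_A$ is evaluated at the midpoint of each such interval.

```python
def pierced(edges, a):
    """f_A(a): number of edges whose interval is crossed by the line x = a."""
    return sum(1 for s, t in edges if min(s, t) < a < max(s, t))


def cost_by_integration(edges):
    """Integrate the step function f_A over the real line."""
    xs = sorted({x for edge in edges for x in edge})
    total = 0
    for left, right in zip(xs, xs[1:]):
        mid = (left + right) / 2          # f_A is constant on (left, right)
        total += pierced(edges, mid) * (right - left)
    return total


edges = [(0, 1), (3, 1), (4, 1)]
print(cost_by_integration(edges))            # 6
print(sum(abs(s - t) for s, t in edges))     # 6
```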
Our definition of $f_A$ is similar in nature to the height function $H : \mathbb{R} \to \mathbb{Z}$
introduced by Karp and Li [?]. Informally, they define $H(a)$ at each point a as
the difference between the number of points in S and the number of points in
T restricted to the interval $(-\infty, a]$ (or equivalently, to the left of the vertical
line $x = a$). Thus H remains constant throughout each interval that does not
contain a point in $S \cup T$. Figure 2 shows the stair-shaped curve of H for a
small example. Note that up transitions in the curve correspond to points in
S and down transitions correspond to points in T. We refer to the value $H(x)$
as the height of x. Note that $H(\infty) = |S| - |T|$.




Figure 2: Height function for sets S = {0,3,4,6,13,14,15,16} and T = 
{1,2,8,10,11,12}. 
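The heights for the example of Figure 2 can be reproduced with a single sweep over the merged, sorted points. This is a small sketch assuming distinct coordinates; the function name `heights` is illustrative.

```python
def heights(S, T):
    """Return [(x, H(x))] for every x in S u T, in increasing order of x."""
    events = sorted([(x, +1) for x in S] + [(x, -1) for x in T])
    h, out = 0, []
    for x, step in events:
        h += step                 # up for a point of S, down for a point of T
        out.append((x, h))
    return out


S = [0, 3, 4, 6, 13, 14, 15, 16]
T = [1, 2, 8, 10, 11, 12]
for x, h in heights(S, T):
    print(x, h)
```

The last printed height is 2, matching $H(\infty) = |S| - |T|$ for this example.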

Lemma 1 If $|S| = |T|$, then $\int_{-\infty}^{+\infty} |H(x)|\,dx$ is the cost of the assignment that
assigns the k-th largest element of S to the k-th largest element of T.

Proof: Follows immediately from (1) and the fact that, for this particular
assignment, $f_A(x) = |H(x)|$ at each point x. □







Figure 3: (a) One-to-one assignment for sets S = {0, 4, 6, 13, 14, 16} and T = 
{1, 2, 8, 10, 11, 12} (b) Shaded area represents the cost of the assignment. 

Figure 3a shows an assignment for two sets S and T, with $|S| = |T|$. The cost
of this assignment is equal to the area shaded in Figure 3b, which is precisely
the value of the integral $\int_{-\infty}^{+\infty} |H(x)|\,dx$.
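Lemma 1 can be checked on the sets of Figure 3 by integrating $|H|$ with a sweep and comparing the result with the cost of the sorted matching. The helper name `abs_height_integral` is an assumption of this sketch.

```python
def abs_height_integral(S, T):
    """Integral of |H(x)| between the smallest and largest point of S u T."""
    events = sorted([(x, +1) for x in S] + [(x, -1) for x in T])
    h, area = 0, 0
    for (x, step), (nxt, _) in zip(events, events[1:]):
        h += step                      # height on the interval [x, nxt)
        area += abs(h) * (nxt - x)
    return area


S = [0, 4, 6, 13, 14, 16]
T = [1, 2, 8, 10, 11, 12]
matching_cost = sum(abs(s - t) for s, t in zip(sorted(S), sorted(T)))
print(abs_height_integral(S, T), matching_cost)  # 15 15
```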






3 Properties of a Minimum Cost Assignment 



Our algorithm for computing a minimum cost assignment A exploits several
important properties of A, which we discuss next. A crossing is defined by a
pair of edges $(a, d)$ and $(b, c)$ such that $a < b$ in S and $c < d$ in T.

Lemma 2 There exists a minimum cost assignment with no crossings. 

Proof: Let A be a minimum cost assignment between S and T with a minimum
number of crossings. If A has zero crossings, the proof is finished. Otherwise,
pick two crossing edges $(a, d)$ and $(b, c)$ in A, with $a < b$ in S and $c < d$ in
T. We show that $A' = A \setminus \{(a, d), (b, c)\} \cup \{(a, c), (b, d)\}$ is an assignment with
$\mathrm{cost}(A') \le \mathrm{cost}(A)$. Since $A'$ has fewer crossings than A, this is a
contradiction. In particular, we show that $f_{A'}(x) \le f_A(x)$ at each point x; then
$\mathrm{cost}(A') \le \mathrm{cost}(A)$ follows immediately from (1).

First note that $f_{A'}(x) \le f_A(x)$ is true for any x such that the vertical line L
at x intersects neither of $(a, d)$ and $(b, c)$. Suppose now that L intersects $(a, c)$.
Then L must also intersect either $(a, d)$ (see Figure 4a) or $(b, c)$ (see Figure 4b)
or both (see Figure 4c). Similarly, if L intersects $(b, d)$, then L also intersects
at least one of $(a, d)$ and $(b, c)$. Furthermore, if L intersects both $(a, c)$ and
$(b, d)$, then L also intersects both $(a, d)$ and $(b, c)$ (see Figure 4c). It follows
that $f_{A'}(x) \le f_A(x)$. □
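The exchange step used in this proof can be turned into a small procedure that repeatedly swaps the endpoints of a crossing pair until no crossings remain. The sketch below is quadratic per round and is meant only to illustrate the argument; the names `crossings` and `uncross` are hypothetical.

```python
def crossings(edges):
    """All ordered pairs ((a, d), (b, c)) with a < b in S and c < d in T."""
    return [(e, f) for e in edges for f in edges
            if e[0] < f[0] and e[1] > f[1]]


def uncross(edges):
    """Replace crossing pairs (a, d), (b, c) by (a, c), (b, d) until none is left."""
    edges = list(edges)
    while True:
        found = crossings(edges)
        if not found:
            return edges
        (a, d), (b, c) = found[0]
        edges.remove((a, d))
        edges.remove((b, c))
        edges += [(a, c), (b, d)]


before = [(0, 8), (3, 1), (4, 1), (6, 1)]
after = uncross(before)
print(sorted(after))                               # [(0, 1), (3, 1), (4, 1), (6, 8)]
print(sum(abs(s - t) for s, t in before),
      sum(abs(s - t) for s, t in after))           # 18 8
```

Each swap keeps every point of S on exactly one edge and every point of T covered, so the result is still an assignment.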






Figure 4: (a) Vertical line L intersects $(a, c)$ and $(a, d)$ (b) L intersects $(a, c)$
and $(b, c)$ (c) L intersects $(a, c)$, $(b, d)$, $(a, d)$ and $(b, c)$.

An assignment A can also be regarded as a function $A : S \to T$ such that
$A(s) = t$ for each $(s,t) \in A$. For any $t \in T$, let $A^{-1}(t)$ denote the set of
elements $s \in S$ such that $A(s) = t$. For each point $s \in S$, define the nearest
neighbor $N(s)$ to be the point in T closest to s, i.e., $|N(s) - s| \le |t - s|$ for any $t \in T$.
In the case of a tie, $N(s)$ is picked arbitrarily from the two candidate
neighbors.
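When T is kept in sorted order, the nearest neighbor of a point can be located by binary search. The following sketch uses Python's standard `bisect` module; the function name `nearest_neighbor` and the choice to break ties in favor of the left candidate are assumptions, since the text allows an arbitrary choice.

```python
from bisect import bisect_left


def nearest_neighbor(s, T_sorted):
    """Return N(s): the point of the sorted list T_sorted closest to s."""
    i = bisect_left(T_sorted, s)
    # Only the points immediately left and right of s can be closest.
    candidates = T_sorted[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda t: abs(t - s))   # leftmost on ties


T = [1, 2, 8, 10, 11, 12]
print([nearest_neighbor(s, T) for s in (0, 4, 6, 16)])  # [1, 2, 8, 12]
```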

Lemma 3 Let A be optimal and let $t \in T$ be such that $A^{-1}(t)$ contains two
or more elements. Then for each $s \in A^{-1}(t)$, t is a nearest neighbor of s.
Furthermore, T contains no points in between s and t.
Proof: Assume to the contrary that there is $s \in S$ with $A(s) = t$, $|A^{-1}(t)| > 1$,
and $N(s) \ne t$. Refer to Figure 5. Define a new assignment $A'$ with $A'(s) = N(s)$
and $A'(x) = A(x)$ for $x \ne s$. Note that $A'$ is also an assignment: $A'^{-1}(t)$
still contains at least one point. Also $\mathrm{cost}(A') = \mathrm{cost}(A) - |s - t| + |s - N(s)|$ (see
Figures 5a and 5b). Since $|s - N(s)| < |s - t|$, it follows that $\mathrm{cost}(A') < \mathrm{cost}(A)$,
contradicting the fact that A is of minimum cost. Thus, t is a nearest neighbor
of s.

Figure 5: (a) Assignment A with $A(s) \ne N(s)$ (b) Assignment $A'$ with $A'(s) = N(s)$.

The claim that T contains no points in between s and t is immediate: if
such a point $t_1 \in T$ existed, then $|s - t_1| < |s - t|$, contradicting the fact that
$N(s) = t$. □

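Observe that for any subset $R \subset S$ of size $|R| = |S| - |T|$, there is a unique
minimum cost assignment (with no crossings) from $S \setminus R$ to T. Let $A_{S \setminus R}$ denote
the edges of such an assignment, and define a new assignment $A_R : S \to T$ as
follows:

$$A_R(x) = \begin{cases} N(x) & \text{if } x \in R \\ y & \text{if } x \in S \setminus R \text{ and } (x, y) \in A_{S \setminus R} \end{cases} \tag{2}$$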

Lemma 3 implies that there always exists a subset R such that $A_R$ defines a
minimum cost assignment from S to T. Furthermore, R satisfies a special height
condition, stated in the lemma below.

Lemma 4 There exists a subset $R \subset S$ with $|R| = |S| - |T|$ such that $A_R$
defines a minimum cost assignment from S to T, and the k-th smallest element
of R has height k.

Proof: Let $A : S \to T$ define a minimum cost assignment. We prove the
existence of $A_R$ by constructing a set R with the properties stated in this
lemma. Initially R is empty. If $|A^{-1}(t)| = 1$ for all $t \in T$, then R is empty and
the proof is finished. Otherwise, we process points $t \in T$ for which $A^{-1}(t)$ has
two or more elements. For each such point we consider two cases, as depicted
in Figure 6. If all points in $A^{-1}(t)$ are less than t, then we add to R all but
the largest (rightmost) point in $A^{-1}(t)$ (see Figure 6a). Otherwise, we add to
R all points in $A^{-1}(t)$ except for the smallest (leftmost) point greater than t
(see Figure 6b).



Figure 6: (a) All points in $A^{-1}(t)$ are less than t. (b) Some points in $A^{-1}(t)$
are greater than t.

We now define $A_R$ as in (2). Since $A_R$ is identical to A, $A_R$ is a minimum
cost many-to-one assignment from S to T.

It remains to show that the k-th smallest element of R has height k. To see
this, first consider the smallest element of a nonempty set $A^{-1}(t) \cap R$. Call
this element r and suppose it is the k-th smallest element of R. It follows then
that (i) R contains $k - 1$ points less than r, and (ii) T and $S \setminus R$ contain an
equal number of elements less than r. This latter claim follows from Lemma 3,
which tells us that T contains no elements in between r and t, and the following
observation: the way in which we have selected R ensures that if t lies to the
left of r (i.e., $t < r$), the assigned item for t in $S \setminus R$ lies to the left of r, and if
t lies to the right of r ($t > r$), the assigned item for t in $S \setminus R$ lies to the right
of r. These together imply that $H(r) = k$.

We now show that the points in $A^{-1}(t) \setminus \{r\}$ have height values $k + 1, k + 2, \ldots$,
in order from smallest to largest. By Lemma 3, T contains no points
in between s and t, for each $s \in A^{-1}(t)$. Then the points in $R \cap A^{-1}(t)$ have
incrementally increasing height values. It follows that the height of the k-th
smallest element of R is k. □
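The two cases in the construction of R can be written down directly. In this sketch an assignment is a dictionary from points of S to points of T, and `build_R` is a hypothetical name. The example uses the sets of Figure 2 together with an assignment of cost 19, which is taken here as the optimal input purely for illustration.

```python
def build_R(assignment):
    """Construct R from an assignment {s: t} as in the proof of Lemma 4."""
    preimage = {}
    for s, t in assignment.items():
        preimage.setdefault(t, []).append(s)
    R = []
    for t, group in preimage.items():
        if len(group) < 2:
            continue                      # t receives a single point: nothing to add
        group.sort()
        right = [s for s in group if s > t]
        # Keep the smallest point right of t if there is one,
        # otherwise keep the largest point (all points are left of t).
        keep = right[0] if right else group[-1]
        R += [s for s in group if s != keep]
    return sorted(R)


A = {0: 1, 3: 2, 4: 2, 6: 8, 13: 10, 14: 11, 15: 12, 16: 12}
print(build_R(A))  # [4, 16]
```

For these sets the points 4 and 16 have heights 1 and 2 respectively, in line with the height condition of the lemma.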

Let $H_R$ represent the height function restricted to sets $S \setminus R$ and T. This means
that for each x, $H_R(x)$ is the difference between the number of points in $S \setminus R$
and the number of points in T restricted to the interval $(-\infty, x]$.

Lemma 5 The cost of assignment $A_R$ is

$$\sum_{r \in R} |r - N(r)| + \int_{-\infty}^{+\infty} |H_R(x)|\,dx \tag{3}$$
Proof: By Lemma 1 we have that the contribution of $S \setminus R$ to the cost of
$A_R$ is $\int_{-\infty}^{+\infty} |H_R(x)|\,dx$. Since each point in R maps to its nearest neighbor,
the contribution of R to the cost of $A_R$ is $\sum_{r \in R} |r - N(r)|$. Together, these
prove the lemma. □
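The quantity in (3) is easy to evaluate for a given R, which makes it possible to compare candidate sets on small inputs. The sketch below, with the illustrative name `cost_formula_3`, computes the nearest-neighbor term by a direct minimum and the integral term by a sweep over $S \setminus R$ and T.

```python
def cost_formula_3(S, T, R):
    """Sum of |r - N(r)| over R plus the integral of |H_R(x)|."""
    rest = sorted(set(S) - set(R))
    nearest_term = sum(min(abs(r - t) for t in T) for r in R)
    events = sorted([(x, +1) for x in rest] + [(x, -1) for x in T])
    h, area = 0, 0
    for (x, step), (nxt, _) in zip(events, events[1:]):
        h += step                       # H_R on the interval [x, nxt)
        area += abs(h) * (nxt - x)
    return nearest_term + area


S = [0, 3, 4, 6, 13, 14, 15, 16]
T = [1, 2, 8, 10, 11, 12]
print(cost_formula_3(S, T, R=[4, 16]))   # 19
print(cost_formula_3(S, T, R=[0, 16]))   # 20
```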

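Theorem 6 Let $R \subset S$ be a subset of size $|R| = |S| - |T|$ with two properties:

i. The k-th smallest element of R has height k.

ii. R minimizes the quantity from (3).

Then $A_R$ defines a minimum cost assignment from S to T.

Proof: By Lemma 4 we know that there exists a set R that satisfies (i) and for
which $A_R$ is a minimum cost assignment. By Lemma 5, the cost of $A_R$ is the
quantity from (3), so a set R that satisfies (i) and (ii) has cost no larger than
that of the set given by Lemma 4. It follows that $A_R$ is a minimum cost assignment
from S to T. □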



4 Computing a Minimum Cost Assignment 

Theorem 6 gives an exact description of the set R that yields a minimum cost
assignment $A_R$. We now turn to the problem of efficiently determining this set.
With this goal in mind, we introduce the following notation. For any point x
and any integer k, define the relative height of x with respect to k as

$$h^k(x) = \begin{cases} 1, & \text{if } H(x) \ge k \\ -1, & \text{if } H(x) < k \end{cases}$$

Observe that when a point s is removed from S, $H(x)$ decreases by 1 for all
$x \ge s$. Suppose that $H(s) = k$, and let m be the largest point in $S \cup T$. The
removal of s causes the area under the height function between s and m to
decrease by the quantity $\int_s^m h^k(x)\,dx$. We use this observation to define the
profit of removing s from S and placing it in R (recall that $A_R$ assigns each
item in R to its nearest neighbor), as follows:

$$P(s) = \int_s^m h^k(x)\,dx - |s - N(s)| \tag{4}$$
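The profit in (4) can be evaluated directly from its definition. The sketch below is a deliberately simple version that takes linear time per point (Section 4.2 describes how to obtain all profits in linear time overall); the name `profit` is an assumption, and m is taken to be the largest point of $S \cup T$.

```python
def profit(s, S, T):
    """P(s) = integral from s to m of h^k(x) dx - |s - N(s)|, with k = H(s)."""
    events = sorted([(x, +1) for x in S] + [(x, -1) for x in T])
    h, k, integral = 0, None, 0
    for (x, step), (nxt, _) in zip(events, events[1:]):
        h += step                         # H on the interval [x, nxt)
        if x == s:
            k = h                         # height of s
        if k is not None:                 # we are to the right of s
            integral += (nxt - x) if h >= k else -(nxt - x)
    nearest = min(abs(s - t) for t in T)  # |s - N(s)|
    return integral - nearest


S = [0, 3, 4, 6, 13, 14, 15, 16]
T = [1, 2, 8, 10, 11, 12]
print([(s, profit(s, S, T)) for s in (0, 4, 15)])  # height 1: [(0, -1), (4, 0), (15, -2)]
print([(s, profit(s, S, T)) for s in (6, 16)])     # height 2: [(6, -8), (16, -4)]
```

For this example the best choices are 4 at height 1 and 16 at height 2.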

The profit function quantifies the benefit of placing s in R, the goal being
to minimize the cost of the assignment defined by $A_R$. The integral term in (4)
represents the effect of excluding s from the one-to-one assignment from $S \setminus R$
to T, as depicted in Figure 7. The term $|s - N(s)|$ in (4) represents the cost
of assigning s to its nearest neighbor. We minimize the cost of the assignment
defined by $A_R$ by choosing items s that maximize $P(s)$. This is formalized in
the following lemma.

Figure 7: A depiction of the integral $\int_s^m h^k(x)\,dx$ for $s = 4$. The integral
represents the effect of excluding 4 from the one-to-one assignment from S to
T.

Lemma 7 Let $R \subset S$ be a set with elements $r_1 < r_2 < \ldots < r_{|S|-|T|}$ such that
$H(r_k) = k$ and $r_k$ maximizes $P(s)$ among all points $s \in S$ of height k. Then R
minimizes

$$\sum_{r \in R} |r - N(r)| + \int_{-\infty}^{+\infty} |H_R(x)|\,dx$$

Proof: Karp and Li [?] proved that any set R of size $|S| - |T|$ whose k-th
smallest element $r_k$ has height k satisfies the equality

$$\int_{-\infty}^{+\infty} |H_R(x)|\,dx = \int_{-\infty}^{m} |H(x)|\,dx - \sum_{r_k \in R} \int_{r_k}^{m} h^k(x)\,dx$$

Summing up the cost contribution of R to both sides of the equality yields

$$\sum_{r \in R} |r - N(r)| + \int_{-\infty}^{+\infty} |H_R(x)|\,dx = \sum_{r \in R} |r - N(r)| + \int_{-\infty}^{m} |H(x)|\,dx - \sum_{r_k \in R} \int_{r_k}^{m} h^k(x)\,dx$$

This is equivalent to

$$\sum_{r \in R} |r - N(r)| + \int_{-\infty}^{+\infty} |H_R(x)|\,dx = \int_{-\infty}^{m} |H(x)|\,dx - \sum_{r \in R} P(r)$$

Since $P(r_k)$ is maximized at each height k and there is only one element in R
at each height, we have that R maximizes $\sum_{r \in R} P(r)$, which in turn minimizes

$$\sum_{r \in R} |r - N(r)| + \int_{-\infty}^{+\infty} |H_R(x)|\,dx$$

as required (refer to Lemma 5). □

The following algorithm uses the preceding lemma to determine the optimal
set R, and then computes the minimum cost assignment.



4.1 The Assignment Algorithm 

Initially R is the empty set.

1. Sort S and T.

2. Calculate $H(x)$ for each $x \in S \cup T$. In between consecutive points, H is
constant.

3. Calculate $P(s)$ for each $s \in S$.

4. For $k = 1, 2, \ldots, |S| - |T|$

4.1 Find the leftmost point $r_k$ of height k that maximizes $P(r_k)$.

4.2 Add $r_k$ to R.

5. Return $A_R$.

Lemma 8 The assignment algorithm computes a minimum cost assignment
from S to T.

Proof: Let $r_k$ be the element of R of height k returned by the algorithm. If
we show that $r_1 < r_2 < \ldots < r_{|S|-|T|}$, then it follows by Lemma 7 that $A_R$
is a minimum cost assignment. We prove below, by contradiction, that indeed
$r_1 < r_2 < \ldots < r_{|S|-|T|}$.

Let m be the largest point in S. Assume that there exists some k ($1 \le k \le
|S| - |T| - 1$) for which the algorithm returns $r_k$ and $r_{k+1}$, with $r_k > r_{k+1}$. Let
$s_k$ be the maximal element at height k in $S \setminus R$ which is less than $r_{k+1}$. By
continuity, such an $s_k$ must exist. Similarly, let $s_{k+1}$ be the minimal element
at height $k + 1$ in $S \setminus R$ which is greater than $r_k$. Such an $s_{k+1}$ must exist since
the height at $\infty$ is $H(\infty) = |S| - |T|$. Refer to Figure 8.

Figure 8: $s_k$ ($s_{k+1}$) is the closest point at height k ($k + 1$) to the left (right) of
$r_{k+1}$ ($r_k$).

Since $H(r_{k+1}) = H(s_{k+1})$ and $r_{k+1} < s_{k+1}$, we have that

$$\int_{r_{k+1}}^{m} h^{k+1}(x)\,dx = \int_{r_{k+1}}^{s_{k+1}} h^{k+1}(x)\,dx + \int_{s_{k+1}}^{m} h^{k+1}(x)\,dx$$

From this and equation (4), we can derive the following relation between the
profit functions of $r_{k+1}$ and $s_{k+1}$:

$$P(r_{k+1}) = P(s_{k+1}) + \int_{r_{k+1}}^{s_{k+1}} h^{k+1}(x)\,dx - |r_{k+1} - N(r_{k+1})| + |s_{k+1} - N(s_{k+1})| \tag{5}$$

Note that equality (5) is the result of breaking up the integral corresponding to
$P(r_{k+1})$ into two parts, and taking into account the distance from each element
to its nearest neighbor. Similarly, we can derive the following relation between
$P(r_k)$ and $P(s_k)$:

$$P(s_k) = P(r_k) + \int_{s_k}^{r_k} h^{k}(x)\,dx - |s_k - N(s_k)| + |r_k - N(r_k)| \tag{6}$$

The nearest neighbor of $s_k$ cannot be farther than $N(r_{k+1})$. This translates
into:

$$|s_k - N(s_k)| \le |r_{k+1} - N(r_{k+1})| + |s_k - r_{k+1}|$$

Also note that $h^k(x)$ is positive on the interval $(s_k, r_{k+1})$, which allows us to
rewrite the previous inequality as:

$$|s_k - N(s_k)| \le |r_{k+1} - N(r_{k+1})| + \int_{s_k}^{r_{k+1}} h^{k}(x)\,dx \tag{7}$$

Similar arguments lead to the following relationship between nearest neighbors
of $r_k$ and $s_{k+1}$:

$$|r_k - N(r_k)| \ge |s_{k+1} - N(s_{k+1})| + \int_{r_k}^{s_{k+1}} h^{k+1}(x)\,dx \tag{8}$$

Finally, on the interval $(r_{k+1}, r_k)$ note that

$$\int_{r_{k+1}}^{r_k} h^{k+1}(x)\,dx \le \int_{r_{k+1}}^{r_k} h^{k}(x)\,dx \tag{9}$$

Let $M_k = |s_k - N(s_k)| - |r_k - N(r_k)|$ and $M_{k+1} = |s_{k+1} - N(s_{k+1})| - |r_{k+1} - N(r_{k+1})|$.
Simple arithmetic that involves inequalities (7), (8) and (9) yields

$$\int_{s_k}^{r_k} h^{k}(x)\,dx - M_k \ge \int_{r_{k+1}}^{s_{k+1}} h^{k+1}(x)\,dx + M_{k+1}$$

This along with (5) and (6) implies that

$$P(s_k) - P(r_k) \ge P(r_{k+1}) - P(s_{k+1})$$

Since $r_{k+1}$ was picked by the assignment algorithm, we have that $P(r_{k+1}) \ge
P(s_{k+1})$. This implies that $P(s_k) \ge P(r_k)$, but since $s_k$ lies to the left of $r_k$,
the assignment algorithm would have picked $s_k$ instead of $r_k$, a contradiction.
□

4.2 Complexity Analysis 

Sorting in step 1 takes $O(n \log n)$ time. All other steps run in $O(n)$ time. The
only steps where this is not obvious are steps 2 and 3, which involve computing
$H(x)$ and $P(x)$ respectively. $H(x)$ can be computed for all $s \in S$ by conducting
a sweep of the sorted points in $S \cup T$, adding one when we encounter an element
of S and subtracting one when we encounter an element of T.

Since all nearest neighbors of the elements of S can easily be computed in
linear time, to show that we can compute the profit function for all elements
of S in linear time we concern ourselves only with computing the integral of the
relative height function $h^k$. This integral can be computed in linear time for all
points in S at height k in a sweep from right to left. For the rightmost element
$s_r$ of S at height k, $\int_{s_r}^{m} h^k(x)\,dx = |s_r - m|$, where m is the largest point in S.
Suppose that we know $\int_{s}^{m} h^k(x)\,dx$ for some item s at height k. Let $s' < s$ be
the largest element in S also at height k, and let $t < s$ be the largest element
in T at height k. Note that by continuity, t exists and must be greater than $s'$.
Also note that $h^k(x)$ is positive for all $s' < x < t$, and $h^k(x)$ is negative for all
$t < x < s$. Thus we can derive the following equation:

$$\int_{s'}^{m} h^k(x)\,dx = \int_{s}^{m} h^k(x)\,dx + |s' - t| - |t - s| \tag{10}$$

This value can be computed in constant time for each $s' \in S$. Thus we can
compute $P(s)$ for all $s \in S$ in linear time.

It follows that the assignment algorithm runs in $O(n \log n)$ time. Furthermore,
if S and T are given in sorted order, the assignment algorithm runs in
$O(n)$ time.
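The steps of Section 4.1 and the right-to-left sweep based on (10) can be combined into one short program. What follows is a minimal sketch under stated assumptions, not a reference implementation: all coordinates are distinct, T is nonempty, m is the largest point of $S \cup T$, and nearest neighbors are found by binary search for brevity (which costs $O(\log n)$ per point instead of the linear sweep mentioned above). The names `nearest` and `min_cost_assignment` are illustrative.

```python
from bisect import bisect_left


def nearest(T, s):
    """Closest point of the sorted list T to s (leftmost on ties)."""
    i = bisect_left(T, s)
    return min(T[max(i - 1, 0):i + 1], key=lambda t: abs(t - s))


def min_cost_assignment(S, T):
    S, T = sorted(S), sorted(T)                         # step 1
    surplus = len(S) - len(T)
    pts = sorted([(x, +1) for x in S] + [(x, -1) for x in T])
    m = pts[-1][0]
    height, h = {}, 0                                   # step 2: heights by a sweep
    for x, d in pts:
        h += d
        height[x] = h
    # Step 3: profits, sweeping right to left and applying recurrence (10).
    integral, last, drop, profit = {}, {}, {}, {}
    for x, d in reversed(pts):
        k = height[x]
        if d == -1:
            drop[k + 1] = x          # H falls from k + 1 to k at this point of T
        else:
            if k in last:            # s = last[k] is the next point of S at height k
                t, s = drop[k], last[k]
                integral[k] += (t - x) - (s - t)
            else:                    # rightmost point of S at height k
                integral[k] = m - x  # only meaningful for 1 <= k <= surplus
            last[k] = x
            profit[x] = integral[k] - abs(x - nearest(T, x))
    # Step 4: for each height, the leftmost point of maximum profit.
    best = {}
    for s in S:                      # left to right; strict '>' keeps the leftmost
        k = height[s]
        if k not in best or profit[s] > profit[best[k]]:
            best[k] = s
    R = [best[k] for k in range(1, surplus + 1)]
    # Step 5: A_R matches S \ R to T in sorted order and sends R to nearest neighbors.
    removed = set(R)
    rest = [s for s in S if s not in removed]
    edges = sorted(list(zip(rest, T)) + [(r, nearest(T, r)) for r in R])
    return R, edges, sum(abs(s - t) for s, t in edges)


R, edges, cost = min_cost_assignment([0, 3, 4, 6, 13, 14, 15, 16],
                                     [1, 2, 8, 10, 11, 12])
print(R, cost)  # [4, 16] 19
```

On the point sets of Figure 2 the sketch selects R = {4, 16} and returns an assignment of cost 19.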

5 Conclusion 

We have shown that the one-to-one assignment algorithm of Karp and Li [?] can be extended
to produce a minimum cost many-to-one assignment. The algorithm runs in
$O(n \log n)$ time, if the input points are given in arbitrary order, and in $O(n)$
time, if the input points are presorted. To our knowledge, this is the first
solution to the assignment problem that achieves this time complexity.